TallyQA: Answering Complex Counting Questions
Most counting questions in visual question answering (VQA) datasets are
simple and require no more than object detection. Here, we study algorithms for
complex counting questions that involve relationships between objects,
attribute identification, reasoning, and more. To do this, we created TallyQA,
the world's largest dataset for open-ended counting. We propose a new algorithm
for counting that uses relation networks with region proposals. Our method lets
relation networks be efficiently used with high-resolution imagery. It yields
state-of-the-art results compared to baseline and recent systems on both
TallyQA and the HowMany-QA benchmark.
Comment: To appear in AAAI 2019. To download the dataset, please go to
http://www.manojacharya.com/
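The abstract's idea of applying a relation network over region proposals can be sketched as follows. This is a minimal illustration under our own assumptions (toy dimensions, random weights, a simple pairwise MLP `g` with ReLU and an aggregation head `f`), not the paper's actual architecture: pairs of region-proposal features are combined with the question embedding, scored by `g`, summed, and mapped by `f` to a count class.

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_network_count(regions, question, W_g, W_f):
    """Toy relation network over region proposals (illustrative only).

    regions:  (N, D) pooled features, one row per region proposal
    question: (Q,)   question embedding
    W_g:      (2D + Q, H) weights of the pairwise relation function g
    W_f:      (H, C) weights of the aggregation head f over C count classes
    """
    N = regions.shape[0]
    pair_sum = np.zeros(W_g.shape[1])
    # Score every ordered pair of regions, conditioned on the question.
    for i in range(N):
        for j in range(N):
            pair = np.concatenate([regions[i], regions[j], question])
            pair_sum += np.maximum(W_g.T @ pair, 0.0)  # g with ReLU
    logits = W_f.T @ pair_sum  # f over the aggregated relation features
    return int(np.argmax(logits))  # predicted count class

# Toy dimensions: 5 proposals, 8-d region features, 6-d question, 4 count classes.
N, D, Q, H, C = 5, 8, 6, 16, 4
regions = rng.standard_normal((N, D))
question = rng.standard_normal(Q)
W_g = rng.standard_normal((2 * D + Q, H))
W_f = rng.standard_normal((H, C))
count = relation_network_count(regions, question, W_g, W_f)
```

Restricting the pairwise computation to a modest number of region proposals (rather than all feature-map cells) is what keeps the quadratic pair loop tractable on high-resolution imagery.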
Answer-Type Prediction for Visual Question Answering
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, a task known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
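The Bayesian combination described above can be sketched by marginalizing over predicted answer types: P(answer | q, i) = Σ_t P(answer | t, q, i) · P(t | q). The example below is our own toy illustration of that marginalization, with made-up types, answers, and probabilities; it is not the paper's model.

```python
def bayesian_answer_scores(type_probs, answer_scores_by_type):
    """Combine a type prior P(type | question) with per-type answer scores.

    type_probs:            dict mapping answer type -> P(type | question)
    answer_scores_by_type: dict mapping type -> {answer: P(answer | type, ...)}
    Returns a dict of combined scores, marginalized over types.
    """
    combined = {}
    for t, p_t in type_probs.items():
        for ans, p_a in answer_scores_by_type[t].items():
            combined[ans] = combined.get(ans, 0.0) + p_t * p_a
    return combined

# Hypothetical example: the question looks like a counting question,
# so the "number" type gets most of the prior mass.
type_probs = {"number": 0.7, "color": 0.3}
answer_scores_by_type = {
    "number": {"2": 0.6, "3": 0.4},
    "color": {"red": 0.8, "blue": 0.2},
}
scores = bayesian_answer_scores(type_probs, answer_scores_by_type)
best = max(scores, key=scores.get)  # "2", since 0.7 * 0.6 = 0.42 is largest
```

Weighting each candidate answer by the probability of its type lets a strong type prediction suppress implausible answers (e.g. a color answer to a counting question) before the discriminative model's scores are combined in.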
- …